
Semi-supervised Bootstrapping approach for Named Entity Recognition



Abstract

The aim of Named Entity Recognition (NER) is to identify references to named entities in unstructured documents and to classify them into predefined semantic categories. NER often benefits from added background knowledge in the form of gazetteers. However, such a collection does not handle name variants and cannot resolve the ambiguities that arise when identifying entities in context and associating them with predefined categories. We present a semi-supervised NER approach that starts by identifying named entities with a small set of training data. The word and context features of the identified named entities are used to define a pattern. The pattern for each named entity category is used as a seed pattern to identify named entities in the test set. Pattern scoring and a tuple value score enable the generation of new patterns to identify the named entity categories. We evaluated the proposed system for English on the tagged (IEER) and untagged (CoNLL 2003) named entity corpora, and for Tamil on documents from the FIRE corpus; it yields an average F-measure of 75% for both languages.
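The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the pattern representation or the scoring formulas, so everything specific here is an assumption made for illustration: a pattern is taken to be the (previous word, next word) context of a single-token entity, the pattern score is the fraction of words seen in that context that are already known entities of the category, and the tuple score is the mean score of the patterns that extracted a candidate. The function names, thresholds, and toy sentences are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def contexts(sentences):
    """Yield (prev_word, word, next_word) for every token, padding sentence edges."""
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for i in range(1, len(padded) - 1):
            yield padded[i - 1], padded[i], padded[i + 1]


def bootstrap(sentences, seeds, rounds=3, min_pattern_score=0.5, min_tuple_score=0.5):
    """Grow a word -> category lexicon from seed entities (illustrative only).

    Assumed scoring, not the paper's:
      pattern score = known entities of the category in the slot / all words in the slot
      tuple score   = mean score of the patterns that extracted the candidate
    """
    known = dict(seeds)

    # Group every word by the context (prev_word, next_word) it occurs in.
    slot_words = defaultdict(set)
    for prev, word, nxt in contexts(sentences):
        slot_words[(prev, nxt)].add(word)

    for _ in range(rounds):
        # Step 1: score each context as a pattern for each category.
        pattern_scores = {}
        for ctx, words in slot_words.items():
            for category in set(known.values()):
                hits = sum(1 for w in words if known.get(w) == category)
                if hits and hits / len(words) >= min_pattern_score:
                    pattern_scores[(category, ctx)] = hits / len(words)

        # Step 2: let the accepted patterns vote for unlabeled words.
        votes = defaultdict(list)
        for (category, ctx), score in pattern_scores.items():
            for w in slot_words[ctx]:
                if w not in known:
                    votes[(w, category)].append(score)

        # Step 3: keep the best-scoring category per candidate above the threshold.
        accepted = {}
        for (w, category), scores in votes.items():
            tuple_score = sum(scores) / len(scores)
            if tuple_score >= min_tuple_score and tuple_score > accepted.get(w, (None, 0.0))[1]:
                accepted[w] = (category, tuple_score)

        if not accepted:  # nothing new was learned, so stop early
            break
        for w, (category, _) in accepted.items():
            known[w] = category

    return known


if __name__ == "__main__":
    toy_sentences = [
        ["president", "Obama", "said"],
        ["president", "Lincoln", "said"],
        ["in", "Chennai", "today"],
        ["in", "Madurai", "today"],
    ]
    toy_seeds = {"Obama": "PERSON", "Chennai": "LOCATION"}
    for word, category in sorted(bootstrap(toy_sentences, toy_seeds).items()):
        print(word, category)
```

On the toy input, the context shared by a seed and an unlabeled word passes the pattern threshold, so "Lincoln" and "Madurai" inherit the categories of "Obama" and "Chennai", while ordinary words such as "president" stay unlabeled. A real system would need multi-token entities, richer word features, and stricter thresholds to limit semantic drift.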
